Wavelets for intonation modeling in HMM speech synthesis

Authors

Antti Suni

Daniel Aalto

Tuomo Raitio

Paavo Alku

Martti Vainio

Abstract

The pitch contour in speech contains information about different linguistic units at several distinct temporal scales. At the finest level, the microprosodic cues are purely segmental in nature, whereas at the coarser time scales, lexical tones, word accents, and phrase accents appear with both linguistic and paralinguistic functions. Consequently, pitch movements happen on different temporal scales: segmental perturbations are faster than typical pitch accents, and so forth. In the HMM-based speech synthesis paradigm, slower intonation patterns are not easy to model. The statistical procedure of decision tree clustering favors the more common instances, resulting in good reproduction of microprosody and declination, but with less variation at the word and phrase level than in human speech. Here we present a system that uses wavelets to decompose the pitch contour into five temporal scales ranging from microprosody to the utterance level. Each component is then trained individually within the HMM framework and used in a superpositional manner at the synthesis stage. The resulting system is compared to a baseline in which only one decision tree is trained to generate the pitch contour.
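The decomposition and superposition steps described in the abstract can be illustrated with a rough sketch. The abstract does not name the wavelet, the scale spacing, or the reconstruction method, so everything below is an assumption made for illustration: a Mexican hat wavelet, five scales one octave apart (in frames), a continuous, already interpolated log-F0 contour, and an explicit residual so that the components sum back to the input exactly. The function names `decompose_contour` and `superpose` are hypothetical, and the per-scale HMM training is not modeled here.

```python
import numpy as np


def mexican_hat(t):
    """Mexican hat (Ricker) mother wavelet, up to a constant factor."""
    return (1.0 - t ** 2) * np.exp(-0.5 * t ** 2)


def decompose_contour(f0, num_scales=5, base_scale=2.0):
    """Split a contour into `num_scales` wavelet components plus a residual.

    Assumptions (not from the paper): Mexican hat wavelet, scales one
    octave apart measured in frames, and a leftover residual term.
    """
    f0 = np.asarray(f0, dtype=float)
    if len(f0) < 3:
        raise ValueError("contour must have at least 3 frames")
    max_half = (len(f0) - 1) // 2  # keep the kernel no longer than the signal
    components = []
    for i in range(num_scales):
        scale = base_scale * 2.0 ** i
        half = min(int(5 * scale), max_half)
        t = np.arange(-half, half + 1) / scale
        kernel = mexican_hat(t)
        kernel -= kernel.mean()          # enforce zero mean after truncation
        kernel /= np.abs(kernel).sum()   # crude amplitude normalization
        components.append(np.convolve(f0, kernel, mode="same"))
    components = np.array(components)
    residual = f0 - components.sum(axis=0)  # whatever the five scales miss
    return components, residual


def superpose(components, residual):
    """Rebuild the contour by summing all scales and the residual."""
    return components.sum(axis=0) + residual


# Synthetic log-F0 contour: declination + slow accent-like wave + fast ripple.
frames = np.arange(200)
log_f0 = (5.0 - 0.001 * frames
          + 0.05 * np.sin(2 * np.pi * frames / 80)
          + 0.01 * np.sin(2 * np.pi * frames / 6))

components, residual = decompose_contour(log_f0)
rebuilt = superpose(components, residual)
```

In a system like the one described, each row of `components` would be modeled separately and the generated components summed at synthesis time; here the rows are simply added back together to show the superposition.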

Similar resources

HMM-Based Speech Synthesis for the Greek Language

The success and the dominance of Hidden Markov Models (HMM) in the field of speech recognition, tends to extend also in the area of speech synthesis, since HMM provide a generalized statistical framework for efficient parametric speech modeling and generation. In this work, we describe the adaption, the implementation and the evaluation of the HMM speech synthesis framework for the case of the ...

Comparison of chironomic stylization versus statistical modeling of prosody for expressive speech synthesis

Chironomic stylization is the process of real-time modification of intonation contours (f0 and tempo) using drawing/writing gestures with a stylus on a graphic tablet. The question addressed in this research is whether hand-made intonation stylization could improve or degrade expressivity and overall quality, compared to statistical modeling of prosody. A system for expressive TTS in French bas...

Maximum-likelihood dynamic intonation model for concatenative text-to-speech system

In this work we present a Maximum Likelihood (ML) joint pitch curve modeling, inspired by HMM TTS synthesis concept. This model provides an optimal solution for the coarse target intonation curve (3 points per syllable) and incorporates both static and dynamic pitch values for better utterance intonation modeling. The coarse intonation curve may be optionally combined with the original pitch ex...

Synthesising intonational varieties of Swedish

Within the research project SIMULEKT (Simulating Intonational Varieties of Swedish), our recent work includes two approaches to simulating intonation in regional varieties of Swedish. The first involves a method for modeling intonation using the SWING (SWedish INtonation Generator) tool, where annotated speech samples are resynthesised with rule-based intonation and audio-visually analysed with...

Intonation issues in HMM-based speech synthesis for Vietnamese

In an HMM-based Text-To-Speech system, contextual features, including phonetic and prosodic factors have a significant influence to the spectrum, F0 and duration of the synthetic voice. This paper proposes prosodic features aiming at improving the naturalness of an HMM-based TTS system (VTed) for a tonal language, Vietnamese. The ToBI (Tones and Break Indices) features are used to learn two cru...

Publication date: 2013
